library(tsibble)
library(fable)
library(tsibbledata)
library(tidyverse)
library(lubridate)
library(ggplot2)
library(urca)
library(feasts)

# Create a tsibble object from scratch
y <- tsibble(
  mth = rep(yearmonth("2010 Jan") + 0:8, each = 3),
  a_observation = rep(c("x", "y", "z"), each = 9),
  b_observation = rep(letters[1:3], times = 9),
  value = rnorm(27),
  key = c(a_observation, b_observation),
  index = mth
)
y

What are implicit missing values and how are they different from explicit missing values?
Implicit missing values represent a gap in the observations based on the time series index: the row is simply absent, whereas an explicit missing value is a row that is present but records NA. Many operations assume an intact vector input ordered in time, so dealing with implicit missing values becomes relevant.
tsibble provides four functions that help to understand and deal with implicit missing values:
- (1) has_gaps() checks if there are implicit missing values
- (2) scan_gaps() reports all implicit missing values
- (3) count_gaps() sums up the time ranges that are absent from the data
- (4) fill_gaps() turns implicit missing values into explicit ones, along with imputing them with values or functions
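As a minimal sketch of these four functions, consider a small hypothetical tsibble (names `toy`, `mth`, and `value` are invented for this example) where one month is deliberately omitted:

```r
library(tsibble)
library(dplyr)

# A toy tsibble covering Jan, Mar, Apr 2010 -- Feb is an implicit gap
toy <- tsibble(
  mth = yearmonth("2010 Jan") + c(0, 2, 3),
  value = c(1.5, 2.0, 2.5),
  index = mth
)

has_gaps(toy)    # .gaps is TRUE: at least one implicit gap exists
scan_gaps(toy)   # lists the missing index, 2010 Feb
count_gaps(toy)  # one gap of length 1
fill_gaps(toy)   # adds an explicit row for 2010 Feb with value = NA
```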
The “pedestrian” dataset contains the hourly pedestrian counts from 2015-01-01 to 2016-12-31 at 4 sensors in the city of Melbourne.
df_pedestrian <- as.data.frame(pedestrian)
df_pedestrian

tsbl_pedestrian <- df_pedestrian %>%
as_tsibble(key = Sensor, index = Date_Time)
tsbl_pedestrian

has_gaps(tsbl_pedestrian, .full = TRUE)

As we can see, each sensor has gaps in time. Let's create a list of all implicit missing values.
scan_gaps(tsbl_pedestrian, .full = TRUE)

As we can see, there are 4,139 implicit missing observations in total. Let's investigate how they are distributed across the sensors!
ped_gaps <- tsbl_pedestrian %>%
count_gaps(.full = TRUE)
ped_gaps

Which sensor has the most missing observations?
ped_gaps %>%
  group_by(Sensor) %>%
  summarise(Total_missing = sum(.n))

Now let's turn the implicit missing values into explicit ones by filling the gaps with NAs.
ped_full <- tsbl_pedestrian %>%
fill_gaps(.full = TRUE)
ped_full %>%
filter(is.na(Count))

All previously identified 4,139 observations now have NAs!
But what if we want to avoid NAs in order to conduct certain operations? We can also fill the gaps with zeros or any other given value.
tsbl_pedestrian %>%
fill_gaps(Count = 0L, .full = TRUE) %>%
filter(Count == 0L)

We can see that there are 8 more rows than expected. This is probably due to some original observations where Count == 0.
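To check this explanation, we can count how many zero counts were already present before the gaps were filled (a quick sanity check, not part of the original workflow):

```r
# Zero counts that existed in the original data; per the observation
# above, these should account for the 8 rows beyond the 4,139 filled gaps
tsbl_pedestrian %>%
  as_tibble() %>%
  filter(Count == 0L) %>%
  nrow()
```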
We can also fill the gaps using a function. For example, we might consider it appropriate to fill the gaps with the mean count of the sensor!
tidy_pedestrians <- tsbl_pedestrian %>%
  group_by_key() %>%
  fill_gaps(Count = mean(Count), .full = TRUE)

daily_ped <- tsbl_pedestrian %>%
group_by_key() %>%
index_by(date = ~ as_date(.)) %>%
summarise(
ped_mean = mean(Count, na.rm = TRUE)
)
daily_ped %>% arrange(desc(ped_mean))

daily_ped %>% filter(Sensor == "Birrarung Marr") %>% autoplot()

tourism

We are interested in the total amount of trips.
tourism_t_trips <- tourism %>%
summarise(Trips = sum(Trips))
tourism_t_trips

tourism_t_trips %>%
autoplot(Trips)

What can we see? There is seasonality and a negative trend that turns positive after 2010. As experienced forecasters, we know that under such conditions, we should choose an exponential smoothing model.
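To back up this reading of the plot, we can decompose the series with STL from feasts (already loaded above); this is an optional check, not part of the original workflow:

```r
# Decompose total trips into trend, seasonal and remainder components;
# the trend panel should show the post-2010 turnaround described above
tourism_t_trips %>%
  model(STL(Trips)) %>%
  components() %>%
  autoplot()
```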
fit <- tourism_t_trips %>%
model(
ets = ETS(Trips),
arima = ARIMA(Trips),
lm = TSLM(Trips ~ trend() + season())
)
fit %>%
forecast(h = "2 years") %>%
autoplot(tourism_t_trips, level = 80, alpha = 0.5)

tourism_t_trips %>%
# Withhold the last 3 years before fitting the model
filter(Quarter < yearquarter("2015 Q1")) %>%
# Estimate the models on the training data (1998-2014)
model(
ets = ETS(Trips),
arima = ARIMA(Trips),
lm = TSLM(Trips ~ trend() + season())
) %>%
# Forecast the withheld time period (2015-2017)
forecast(h = "3 years") %>%
# Compute accuracy of the forecasts relative to the actual data
accuracy(tourism_t_trips)
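To make the comparison easier to read, the accuracy table can be sorted by an error measure such as RMSE (one of the columns returned by accuracy()). A sketch, repeating the training pipeline above so the snippet stands on its own (the name fit_train is invented here):

```r
# Refit the three models on the 1998-2014 training window
fit_train <- tourism_t_trips %>%
  filter(Quarter < yearquarter("2015 Q1")) %>%
  model(
    ets = ETS(Trips),
    arima = ARIMA(Trips),
    lm = TSLM(Trips ~ trend() + season())
  )

# Rank the models by out-of-sample RMSE; the first row is the
# model with the smallest error on the withheld 2015-2017 data
fit_train %>%
  forecast(h = "3 years") %>%
  accuracy(tourism_t_trips) %>%
  arrange(RMSE)
```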